The Better Predictive Model: High q² for the Training Set or Low Root Mean Square Error of Prediction for the Test Set?

Authors

  • Aynur O. Aptula
  • Nina G. Jeliazkova
  • Terry W. Schultz
  • Mark T. D. Cronin

Abstract

The validation of computational models (e.g., QSARs) may become the most important step in their development. Different requirements for the reliability and predictivity of QSAR models have been described in the literature. Despite these formal recommendations, there are few practical rules as to when to cease adding variables to a QSAR (i.e., what level of model complexity is appropriate). In this work, the influence of model complexity on statistical fit and error was investigated using toxicity data for 200 phenols to the ciliated protozoan Tetrahymena pyriformis, with a test set of a further 50 compounds. The results showed that several factors play a role in defining a good and reliable QSAR: q² is not a good criterion of a model's predictivity; outliers should not necessarily be deleted, as this may reduce the chemical space of the model; the number of descriptors in a multivariate model should be chosen carefully to avoid model under- and over-estimation; and an appropriate number of dimensions is required for PLS modelling.
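
The contrast between the two criteria in the title can be sketched numerically. The following is a minimal illustration, not the authors' procedure: it assumes ordinary least-squares regression, a leave-one-out q² defined as 1 - PRESS / SS_total (with SS_total taken about the training-set mean), and synthetic random data standing in for the phenol toxicity set. The function names (`fit_ols`, `loo_q2`, `rmsep`) and all numeric settings are hypothetical choices made for this example.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ intercept + X."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    """Predictions from coefficients returned by fit_ols."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def loo_q2(X, y):
    """Leave-one-out cross-validated q^2 = 1 - PRESS / SS_total."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i              # drop compound i
        coef = fit_ols(X[keep], y[keep])      # refit without it
        pred = predict_ols(coef, X[i:i + 1])[0]
        press += (y[i] - pred) ** 2
    ss_total = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / ss_total

def rmsep(X_train, y_train, X_test, y_test):
    """Root mean square error of prediction on an external test set."""
    coef = fit_ols(X_train, y_train)
    resid = y_test - predict_ols(coef, X_test)
    return float(np.sqrt(np.mean(resid ** 2)))

# Synthetic stand-in data (assumption): 200 training and 50 test "compounds",
# 10 candidate descriptors, of which only the first two carry signal.
rng = np.random.default_rng(0)
n_train, n_test, n_desc = 200, 50, 10
X = rng.normal(size=(n_train + n_test, n_desc))
y = 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.5, size=n_train + n_test)
X_train, y_train = X[:n_train], y[:n_train]
X_test, y_test = X[n_train:], y[n_train:]

# Add descriptors one at a time and track both criteria.
results = []
for k in range(1, n_desc + 1):
    q2 = loo_q2(X_train[:, :k], y_train)
    err = rmsep(X_train[:, :k], y_train, X_test[:, :k], y_test)
    results.append((k, q2, err))
    print(f"{k:2d} descriptors: q2 = {q2:.3f}, RMSEP = {err:.3f}")
```

Printing both columns side by side makes it possible to check whether the model complexity that maximises training-set q² is also the one that minimises test-set RMSEP, which is the question the abstract raises. With real descriptors, the order in which variables enter the model would come from a selection procedure rather than from column order as assumed here.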

Similar resources

Artificial Neural Network Modeling for Predicting of some Ion Concentrations in the Karaj River

The water quality of the Karaj River was studied using a data set of 2137 experimental measurements collected at 20 sampling stations. The data included parameters such as T (temperature), pH, NTU (turbidity), hardness, TDS (total dissolved solids), EC (electrical conductivity) and basic anion and cation concentrations. In this study a multi-layer perceptron artificial neural network model was d...

Global Solar Radiation Prediction for Makurdi, Nigeria Using Feed Forward Backward Propagation Neural Network

The optimum design of solar energy systems strongly depends on the accuracy of solar radiation data. However, the availability of accurate solar radiation data is undermined by the high cost of measuring equipment or non-functional ones. This study developed a feed-forward backpropagation artificial neural network model for prediction of global solar radiation in Makurdi, Nigeria (7.7322° N lo...

Quantitative Modeling for Prediction of Critical Temperature of Refrigerant Compounds

The quantitative structure-property relationship (QSPR) method is used to develop the correlation between structures of refrigerants (198 compounds) and their critical temperature. Molecular descriptors calculated from structure alone were used to represent molecular structures. A subset of the calculated descriptors selected using a genetic algorithm (GA) was used in the QSPR model development...

The CFD Provides Data for Adaptive Neuro-Fuzzy to Model the Heat Transfer in Flat and Discontinuous Fins

In the present study, an Adaptive Neuro-Fuzzy Inference System (ANFIS) approach was applied to predict the heat transfer and air flow pressure drop on flat and discontinuous fins. The heat transfer and friction characteristics were experimentally investigated in four flat and discontinuous fins with different geometric parameters, including fin length (r), fin interruption (s), fin pitch (p), ...

Bayesian prediction of rotational torque to operate horizontal drilling

Horizontal directional drilling is usually used in drilling engineering. In a variety of conditions, it is necessary to predict the torque required for performing the drilling operation. Nevertheless, there is presently not a convenient method available to accomplish this task. In order to overcome this difficulty, the current work aims at predicting the required rotational torque (RT) to opera...

Publication date: 2005